Introduction

The invention relates to data processing modelling systems having artificial neural
networks.

Applications of such systems include financial prediction and data mining for 
analysis of customer behaviour. 

A neural network is a biologically inspired data processing system. A large number
of different neural network architectures and training methods exist. By far the most
popular neural network for time-series modelling is the back-propagation neural
network. The back-propagation neural network is a multi-layered network of
processing units and weights. The weights are parameters and, together with the
processing units (which are usually simple sigmoid transfer functions), they can
model a process represented as an input data-set or "training set". The training
process used to parameterise the network (i.e. find a good set of weights) is a variant
of gradient descent.

At present, the generalisation error in such systems arises from bias, variance and 
noise variance. In applications such as financial prediction the signal-to-noise ratio is 
often low and thus the (unpredictable) noise variance component is high relative to 
the (predictable) bias and variance components. 
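
The three components named above can be written out for squared error as follows. This is the standard decomposition rather than a formula taken from the original text, and the symbols are introduced here for illustration only: the target is assumed to be y = f(x) + ε with zero-mean noise of variance σ², and \hat{f}(x; D) is a predictor trained on a randomly drawn training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^{2}\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}_{D}\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{noise variance}}
```

Only the first two terms depend on the model and the training procedure; the third is fixed by the data.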

It is therefore desirable to reduce bias and variance to achieve good modelling 
performance. 



A number of ensemble techniques have been proposed to reduce variance in neural
network regression models. There are different ensemble techniques, and the most
popular include some elaboration of "bagging" or "boosting". The basic principle of
these techniques is to generate multiple versions of a predictor. When predictions
from these versions are combined (averaged for example), smoother, more stable
predictions are generated. When applied to neural networks, these techniques can
yield dramatic improvements in prediction performance. This is because neural
networks are inherently unstable, i.e. small changes in training set and/or parameter
selection can produce large changes in performance. Bagging is widely accepted as
one of the most popular neural network ensemble techniques. It uses the "bootstrap
technique", a very popular statistical re-sampling technique, to generate multiple
training sets and networks for an ensemble. Each ensemble training set is the same
size as the original training set, but given that the bootstrap samples data with
replacement, individual training samples may appear several times in an ensemble
training set while others may be left out. Outputs from the trained networks in a
bagged ensemble are combined using a simple average to produce smoother, more
stable predictions.
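
The bagging procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: a least-squares linear model stands in for the neural network (an assumption made to keep the example short), and the function names and the default of 25 models are hypothetical.

```python
import numpy as np


def bootstrap_sample(X, y, rng):
    """Draw len(X) examples with replacement: some repeat, others are left out."""
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]


def fit_linear(X, y):
    """Stand-in base learner (assumption): least-squares linear model with a bias term."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w


def bagged_predict(X_train, y_train, X_new, n_models=25, seed=0):
    """Train one model per bootstrap training set and average their outputs."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        Xb, yb = bootstrap_sample(X_train, y_train, rng)
        preds.append(predict_linear(fit_linear(Xb, yb), X_new))
    return np.mean(preds, axis=0)  # simple average across the ensemble
```

A linear model is already stable, so it gains little from bagging; the benefit described in the text arises with unstable learners such as neural networks.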

Thus, while ensemble techniques correct for variance, little progress has been made 
to also correct bias, or to correct for sources of bias that are difficult to detect. 

It is therefore an object of the invention to provide a training method and a
modelling system to provide correction of both bias and variance and/or to correct
for sources of bias. Another object is for such a method and system to estimate
average ensemble generalisation error.

Statements of Invention

According to the invention, there is provided a method of training a modelling
system neural network, the method comprising the steps of:

building stages of bagged ensembles having training sets; and

after each stage adapting the training sets so that bias is identified and
compensated for.

In one embodiment, the method comprises the further step of estimating
generalisation performance at each stage.

In another embodiment, the training sets are generated by sampling with
replacement from an original training set.

In one embodiment, the step of building stages comprises:-

calculating the number of training vectors in a training set; and

setting up initial bootstrap training sets.

In another embodiment, the step of adapting the training set comprises:-

propagating training vectors through a trained ensemble; and

storing the ensemble response for each training vector.

In a further embodiment, said step comprises the further steps of:-

combining ensembles across all stages trained so far,

calculating the generalisation error; and

continuing training while there are significant improvements in the
generalisation error.

The invention also provides a modelling system whenever trained in a method as
described above.



Detailed Description of the Invention

The invention will be more clearly understood from the following description of
some embodiments thereof, given by way of example only with reference to Fig. 1
which is a flow diagram illustrating operation of a modelling system for financial
prediction.

Referring to Fig. 1, operation of a modelling system of the invention is illustrated.
The application in this embodiment is US Dollar / Japanese Yen closing price
prediction. The inputs are a representative set of historical daily closing prices, and
the output is a closing price prediction. The following are the steps.



1. Retrieve a representative set of historical daily closing prices, e.g. 3 years of US
Dollar / Japanese Yen daily closing prices.

2. Arrange these into a training set (T) consisting of input vectors and targets.

3. Set the invention's input parameters, i.e. maximum number of epochs (E),
number of stages (S), number of networks per stage (B).

4. Trigger the invention by calling the NeuralDVB (E, S, B, T) function.

5. The invention outputs an optimal set of weights (parameters) that can be used to
generate a prediction for a target (i.e. US Dollar / Japanese Yen closing price)
when presented with an input vector.
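
Step 2 does not say how the input vectors are formed. One common arrangement, shown here purely as an assumption, is a sliding window in which each input vector holds a fixed number of consecutive closing prices and the target is the next day's close. The function name and the window length are hypothetical.

```python
import numpy as np


def make_training_set(prices, window=5):
    """Arrange a 1-D series of closing prices into (input vector, target) pairs.

    Each input vector holds `window` consecutive closes; the target is the
    close that immediately follows them.
    """
    prices = np.asarray(prices, dtype=float)
    if len(prices) <= window:
        raise ValueError("need more prices than the window length")
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    t = prices[window:]
    return X, t
```

For example, `make_training_set(closes, window=5)` on a series of 750 closes yields 745 input vectors of length 5 with one target each.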



The modelling system's neural network ensemble is trained to have bias and variance
close to zero. It thus yields a higher quality output - in this instance a prediction for
a US Dollar / Japanese Yen daily closing price that better reflects the true price on
this day. The outputs may be used as decision support indicators for individual
traders or as inputs to a systematic trading platform.

The modelling system is trained in a method comprising the steps of building stages
of bagged ensembles and after each stage the training set is adapted so that bias is
identified and compensated for. Training continues until the addition of a new stage
does not improve generalisation performance. Variance is corrected for at each stage
by the bagging.



A good estimate of generalisation performance is required at each stage. Training
sets in a bagged ensemble are generated by sampling with replacement from the
original training set. Let us call this training set T. The probability that a training
example from T will not be part of a bootstrap re-sampled training set is
(1 - 1/N)^N ≈ 0.368, where N is the number of training examples in T. This means that
approximately 37% of the original training examples in T will not be used for
training, i.e. they will be out-of-sample. These are used to estimate generalisation
error at each stage. The method is described in detail below:

FUNCTION NeuralDVB (E, S, B, T)

RETURNS W    // Weights for a full set of networks across all stages

INPUTS  E    // Maximum number of epochs
        S    // Maximum number of stages
        B    // Maximum number of networks in an ensemble
        T    // Original training set. It consists of a set of input
             // vectors and targets, i.e. T = {(t_n, x_n)}, n = 1..N

// Calculate the number of training vectors in the training set
N <- |T|

// Set up initial bootstrap training sets
FOR b = 1 TO B
    t*_b <- Sample with replacement N times from T
ENDFOR
T*_1 <- {t*_1, ..., t*_B}

// Loop to build stages
s <- 1
finished <- FALSE
REPEAT
    // Build a single stage. Note that variance is corrected for at each
    // stage by the bagging
    W_s <- TrainStage (N, E, s, B, T*_s)

    // Propagate training vectors through ensemble just trained and
    // store the ensemble responses for each training vector
    M <- PropStage (N, s, T*_s, W_s)

    // Combine ensembles across all stages trained so far, calculate the
    // generalisation error and if there is no significant improvement
    // set finished to true
    finished <- CombineStages (N, s, M, T, olderr)

    // If finished is still false then adapt training set and continue
    IF (finished = FALSE)
        T*_{s+1} <- AdaptSet (N, s, M, T*_s)

    // Increment stage
    s <- s + 1
WHILE ((finished = FALSE) AND (s <= S))

// Set up return matrix
W <- {W_1, ..., W_{s-1}}

RETURN (W)



FUNCTION TrainStage (N, E, s, B, T*_s)

RETURNS W_s    // Optimal set of weights for this stage

INPUTS  N, E, s, B    // As above
        T*_s          // Set of bootstrap training sets for this stage

// Copy training sets for this stage into individual sets
{t*_1, ..., t*_B} <- T*_s

// Compute ensemble generalisation error estimates for each training
// example. Note that γ_n^b is a variable that indicates whether training
// example n is out-of-sample for bootstrap training set t*_b or not;
// γ_n^b = 1 if it is and γ_n^b = 0 if it is not. Also, note that
// φ(x_n; w_e^b) is used to represent the response of a neural network,
// given input vector x_n and weights trained for e epochs using training
// set t*_b
FOR n = 1 TO N
    FOR e = 1 TO E
        Compute: [equation not recoverable]
    ENDFOR
ENDFOR

// Aggregate the ensemble generalisation error estimates for each training
// example to produce an estimate for the average ensemble generalisation
// error
FOR e = 1 TO E
    Compute: [equation not recoverable]
ENDFOR

// Find the best value for e for each network in the ensemble
Compute:
    // Find the optimal value for e, i.e. the value for e that
    // minimises the average ensemble generalisation error. The
    // corresponding networks are chosen as the optimal set for
    // the ensemble.
    e_opt <- argmin_e (A_e)    // A_e: average ensemble generalisation error at epoch e

RETURN (W_s)



FUNCTION PropStage (N, s, T*_s, W_s)

RETURNS M

INPUT N, s, T*_s, W_s    // As above

// Compute ensemble (bagged) outputs for each training example for this
// stage. These will be used later to adapt the training set
FOR n = 1 TO N
    Compute: [equation not recoverable]
ENDFOR

RETURN (M)



FUNCTION CombineStages (N, s, M, T, olderr)

RETURNS finished

INPUT N, s, M, T, olderr    // As above

// Set new variable as upper bound on number of stages so far
numstages <- s

// Sum ensemble outputs across stages
FOR n = 1 TO N
    [equation not recoverable]
ENDFOR

// Calculate staged ensemble generalisation error
newerror <- (1/N) Σ_n (t_n - S_n)^2

// If no improvement finish training
IF s = 1
    olderr <- newerror
ELSE IF (newerror > (δ * olderr))
    // δ is a tuning variable that sets the strictness of the stopping
    // condition for adding new stages
    finished <- 1
ELSE IF (newerror < olderr)
    olderr <- newerror

RETURN (finished)



FUNCTION AdaptSet (N, s, M, T*_s)

RETURNS T*_{s+1}

INPUT N, s, M, T*_s    // As above

// Set new variable as upper bound on number of stages so far
numstages <- s

// Sum ensemble outputs across stages
FOR n = 1 TO N
    [equation not recoverable]
ENDFOR

// Adapt training set
FOR n = 1 TO N
    [equation not recoverable]
ENDFOR

RETURN (T*_{s+1})

Note that NeuralDVB returns a set of weights (W) for a neural network ensemble
that has a bias and variance close to zero. These weights (which are simply floating
point numbers) can be used to generate a prediction for any input vector drawn from
the same probability distribution as the training set input vectors.
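
The overall stage loop can be illustrated with the following sketch, which rests on several assumptions not confirmed by the text. The adaptation step is assumed to replace each target with the residual left after summing the ensemble outputs of all stages so far. The stored ensemble response is assumed to be the average over networks for which the example was out-of-sample. A stage that fails the stopping test is discarded. A small network with a random tanh hidden layer and least-squares output weights stands in for a back-propagation network trained with early stopping. All function names, and the parameter `delta` for the stopping tolerance, are hypothetical.

```python
import numpy as np


def fit_net(X, t, rng, hidden=4):
    """Stand-in learner (assumption): random tanh hidden layer, least-squares output."""
    W = rng.normal(size=(X.shape[1] + 1, hidden))
    H = np.tanh(np.column_stack([X, np.ones(len(X))]) @ W)
    v, *_ = np.linalg.lstsq(np.column_stack([H, np.ones(len(X))]), t, rcond=None)
    return W, v


def run_net(net, X):
    W, v = net
    H = np.tanh(np.column_stack([X, np.ones(len(X))]) @ W)
    return np.column_stack([H, np.ones(len(X))]) @ v


def train_stage(X, t, B, rng):
    """Train one bagged stage; return its networks and out-of-sample averaged outputs."""
    N = len(X)
    nets, total, count = [], np.zeros(N), np.zeros(N)
    for _ in range(B):
        idx = rng.integers(0, N, size=N)      # bootstrap training set
        net = fit_net(X[idx], t[idx], rng)
        oos = np.ones(N, dtype=bool)
        oos[idx] = False                      # examples this network never saw
        total[oos] += run_net(net, X[oos])
        count[oos] += 1
        nets.append(net)
    return nets, total / np.maximum(count, 1)  # 0 where never out-of-sample


def staged_bagging(X, t, S=5, B=20, delta=1.0, seed=0):
    """Add bagged stages fitted to residuals until the error estimate stops improving."""
    rng = np.random.default_rng(seed)
    stages, errors = [], []
    summed, targets, old_err = np.zeros(len(X)), t.copy(), None
    for _ in range(S):
        nets, m = train_stage(X, targets, B, rng)
        new_err = float(np.mean((t - (summed + m)) ** 2))
        if old_err is not None and new_err > delta * old_err:
            break                             # no improvement: discard this stage
        old_err = new_err if old_err is None else min(old_err, new_err)
        stages.append(nets)
        errors.append(new_err)
        summed += m
        targets = t - summed                  # assumed adaptation: fit the residual next
    return stages, errors


def predict(stages, X):
    """Sum, across stages, the simple average of each stage's network outputs."""
    return sum(np.mean([run_net(n, X) for n in nets], axis=0) for nets in stages)
```

Averaging within a stage addresses variance, while fitting each new stage to what earlier stages left unexplained addresses bias. With `delta=1.0` a stage is kept only if it does not raise the error estimate.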

It will be appreciated that the invention provides the following improvements over 
the art: 

1. It explicitly corrects for bias and variance in neural networks. 

2. It corrects for sources of bias that are difficult to detect and are not reflected in the
average mean-squared generalisation error. For example, some time-series data
such as financial data can have a dominant directional bias. This is problematic
as it can cause neural network models to be built that perform well based on the
average mean-squared error but poorly when predicting a directional change that
is not well represented in the training data. The invention automatically corrects
for this bias (along with usual sources of bias) despite it not being reflected in the
average mean-squared generalisation error.

3. It uses an early-stopping based method to estimate average ensemble 
generalisation error. Good estimates of generalisation performance are critical to 
success. 
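
The out-of-sample fraction on which the generalisation estimate relies can be checked numerically, as in the sketch below. The error function shows one standard way to form an out-of-sample ensemble estimate: each example is scored using only the networks that did not train on it. This is offered as an assumption, since the exact expressions used by the method are not reproduced in the text, and the function names are hypothetical.

```python
import numpy as np


def out_of_sample_mask(n_examples, n_sets, rng):
    """mask[b, n] is True when example n does not appear in bootstrap set b."""
    mask = np.ones((n_sets, n_examples), dtype=bool)
    for b in range(n_sets):
        drawn = rng.integers(0, n_examples, size=n_examples)
        mask[b, drawn] = False
    return mask


def out_of_sample_mse(preds, targets, mask):
    """Mean-squared error of the ensemble average over out-of-sample networks only.

    preds[b, n] is the output of network b for example n. Examples that were
    never out-of-sample for any network are skipped.
    """
    counts = mask.sum(axis=0)
    seen = counts > 0
    avg = (preds * mask).sum(axis=0)[seen] / counts[seen]
    return float(np.mean((targets[seen] - avg) ** 2))


N = 750  # roughly three years of daily closes (assumption)
mask = out_of_sample_mask(N, n_sets=200, rng=np.random.default_rng(0))
print((1 - 1 / N) ** N, mask.mean())  # analytic and simulated out-of-sample fraction
```

Both printed values should lie close to 0.368, in line with the 37% figure quoted in the description.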

The invention is not limited to the embodiments described but may be varied in 
construction and detail. 
Claims

1. A method of training a modelling system neural network, the method 
comprising the steps of: 

building stages of bagged ensembles having training sets; and 

after each stage adapting the training sets so that bias is identified and 
compensated for. 

2. A method as claimed in claim 1, wherein the method comprises the further 
steps of estimating generalisation performance at each stage. 

3. A method as claimed in claim 1 or 2, wherein the training sets are generated
by sampling with replacement from an original training set.

4. A method as claimed in any preceding claim, wherein the step of building 
stages comprises :- 

calculating the number of training vectors in a training set; and 

setting up initial bootstrap training sets. 

5. A method as claimed in any preceding claim, wherein the step of adapting the
training set comprises:-

propagating training vectors through a trained ensemble; and

storing the ensemble response for each training vector.

6. A method as claimed in claim 5, wherein said step comprises the further steps
of:-

combining ensembles across all stages trained so far,

calculating the generalisation error; and

continuing training while there are significant improvements in the
generalisation error.

7. A modelling system neural network training method substantially as
hereinbefore described.

8. A modelling system comprising a neural network whenever trained in a
method as claimed in any preceding claim.

9. A computer program product comprising software code for performing the
steps of any of claims 1 to 7 when executing on a digital computer.